Sequencing and Raw Sequence Data Quality Control ◾ 5
for PCR DNA strand synthesis and barcode sequence for indexing the sample DNA. This
allows multiple samples (multiplexing) to be sequenced in a single run; the DNA fragments
of each sample will have a unique barcode. Later, after sequencing, the sample sequences
can be separated in the analysis by demultiplexing. In some sequencing applications,
such as gene expression (RNA-Seq) and epigenetics, an enrichment step is usually included
to amplify or to separate only the targeted sequences. In RNA applications, enrichment is
performed to separate mRNA from the other types of RNA. In epigenetics, the genomic
regions where protein-DNA interactions take place can also be enriched. The enrichment is
usually performed with PCR, but there are other means as well. The library preparation of
the DNA/RNA is similar for all NGS technologies but the sequencing process is different
from one to another. The sequences produced by the NGS technologies range between 75
and 400 base pairs (bp) in length. These sequences are called short reads. In general,
short-read sequencing (SRS) can be either single-end sequencing, which reads each fragment
from one end only (the forward strand), or paired-end sequencing, which reads each fragment
from both ends (the forward and reverse strands). The latter reduces the chance of
base-calling errors in the resulting sequence. The DNA or
RNA reads consist of the four nucleobase characters A, C, G, and T. However, the sequence
may also include N for an unresolved base.
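The barcode-based separation of samples described above can be illustrated with a short Python sketch. It is a simplified illustration, not a production demultiplexer: it assumes that each read starts with an in-line barcode, that all barcodes have the same length, and that only exact matches count. The function names, barcodes, and reads are hypothetical and chosen only for this example. A second helper reports the fraction of unresolved bases (N) in a read.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by the barcode found at the start of each read.

    Assumes in-line barcodes of equal length and exact matching
    (an assumption made for this sketch only).
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads whose barcode is not recognized are kept in a separate bin.
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


def n_fraction(read):
    """Return the fraction of unresolved bases (N) in a read."""
    return read.count("N") / len(read) if read else 0.0


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTTTGACCA", "TGCAGGNATTC", "ACGTCCATGGA", "GGGGATATATA"]

groups = demultiplex(reads, barcodes)
for sample, inserts in groups.items():
    print(sample, inserts)
```

In this toy input, the last read carries a barcode that matches neither sample, so it ends up in the "undetermined" group rather than being assigned to a sample.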
1.2.2.1 Roche 454 Technology
Roche 454 pioneered NGS when the 454 platform was used to sequence the whole genome of
Mycoplasma genitalium in 2006 [3]. The 454 technology uses pyrosequencing, which
depends on the sequential addition and incorporation of nucleotides along the DNA template.
The signal of each added nucleotide is quantified by converting the released pyrophosphate
into a light signal in real time. Pyrosequencing is based on a series of enzymatic
reactions that lead to DNA synthesis and the release of inorganic pyrophosphate every
time a nucleotide is incorporated by the polymerase into the DNA chain. The intensity of the
light generated by the reactions is detected by a charge-coupled device (CCD) camera. The order
of the nucleotides in the DNA template is determined by quantifying the light intensity.
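How a sequence can be read from light intensities may be easier to see in a small Python sketch. It is an idealized, noise-free model, not the 454 base-calling software. It assumes that one nucleotide type is flowed at a time in a fixed cyclic order (taken here as T, A, C, G) and that the light signal of a flow equals the number of identical bases incorporated in that flow. The function names and the example sequence are hypothetical.

```python
def simulate_flowgram(new_strand, flow_order="TACG"):
    """Return a list of (nucleotide, signal) pairs, one per flow.

    new_strand is the strand being synthesized. The signal is the number
    of consecutive bases incorporated during that flow (idealized model).
    """
    unknown = set(new_strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")

    flows = []
    pos = 0       # next position of the strand to be synthesized
    flow_idx = 0  # index of the current flow
    while pos < len(new_strand):
        base = flow_order[flow_idx % len(flow_order)]
        run = 0
        # Incorporate as many matching bases as possible in this flow.
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1
            pos += 1
        flows.append((base, run))
        flow_idx += 1
    return flows


def decode_flowgram(flows):
    """Rebuild the sequence by repeating each base by its rounded signal."""
    return "".join(base * round(signal) for base, signal in flows)


flows = simulate_flowgram("TTACGGG")
print(flows)
print(decode_flowgram(flows))
```

A flow with signal 0 means that the flowed nucleotide was not incorporated, and a signal of 2 or more indicates a run of identical bases (a homopolymer). In real data the signals are noisy, so long homopolymers are harder to call than this idealized model suggests.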
In pyrosequencing, the DNA is fragmented and denatured into ssDNA. Two adaptors
(A and B) are ligated to the two ends of each fragment. Beads carrying
single-stranded primers complementary to adaptor A are added to the reaction. Adaptor
A on the ssDNA template hybridizes to the bead primers, which initiates the synthesis
of the complementary strand. This step can be repeated several times for enrichment
(PCR). Then, the beads with the ssDNA templates are placed into wells where sequencing
takes place. A primer complementary to adaptor B is added to initiate the addition of
new nucleotides to the complementary strand. This time, however, one known nucleotide
type is added at a time. Every time a nucleotide is incorporated into the complementary
strand, the 3′ hydroxyl group of the last nucleotide reacts with the alpha phosphate of the
incoming nucleotide, releasing a two-phosphate compound called inorganic pyrophosphate
(PPi). The enzyme ATP sulfurylase, which is added to the reaction together with its
substrate adenosine 5′-phosphosulfate (APS), uses the PPi to convert APS into adenosine
triphosphate (ATP). Finally, luciferase uses the ATP to oxidize luciferin, which
produces light. Every time a nucleotide is added, a light is emitted